MOLECULAR ECOLOGY 



Molecular Ecology (2014) 23, 4035-4058



doi: 10.1111/mec.12810



A high-density linkage map enables a second-generation 
collared flycatcher genome assembly and reveals the 
patterns of avian recombination rate variation and 
chromosomal evolution 

TAKESHI KAWAKAMI,* LINNEA SMEDS,* NICLAS BACKSTROM,* ARILD HUSBY,†
ANNA QVARNSTROM,† CARINA F. MUGAL,* PALL OLASON* and HANS ELLEGREN*
*Department of Evolutionary Biology, Evolutionary Biology Centre (EBC), Uppsala University, Norbyvagen 18D, SE-752 36
Uppsala, Sweden, †Department of Animal Ecology, Evolutionary Biology Centre (EBC), Uppsala University, Norbyvagen 18D,
SE-752 36 Uppsala, Sweden

Abstract 

Detailed linkage and recombination rate maps are necessary to use the full potential 
of genome sequencing and population genomic analyses. We used a custom collared 
flycatcher 50 K SNP array to develop a high-density linkage map with 37 262 markers 
assigned to 34 linkage groups in 33 autosomes and the Z chromosome. The best-order 
map contained 4215 markers, with a total distance of 3132 cM and a mean genetic dis- 
tance between markers of 0.12 cM. Facilitated by the array being designed to include 
markers from most scaffolds, we obtained a second-generation assembly of the fly- 
catcher genome that approaches full chromosome sequences (N50 super-scaffold size 
20.2 Mb and with 1.042 Gb (of 1.116 Gb) anchored to and mostly ordered and oriented 
along chromosomes). We found that flycatcher and zebra finch chromosomes are 
entirely syntenic but that inversions at mean rates of 1.5-2.0 events (6.6-7.5 Mb) per My
have changed the organization within chromosomes, rates high enough for inversions 
to potentially have been involved with many speciation events during avian evolution. 
The mean recombination rate was 3.1 cM/Mb and correlated closely with chromosome 
size, from 2 cM/Mb for chromosomes >100 Mb to >10 cM/Mb for chromosomes 
<10 Mb. This size dependence seemed entirely due to an obligate recombination event 
per chromosome; if 50 cM was subtracted from the genetic lengths of chromosomes, 
the rate per physical unit DNA was constant across chromosomes. Flycatcher recombi- 
nation rate showed similar variation along chromosomes as chicken but lacked the 
large interior recombination deserts characteristic of zebra finch chromosomes.

Introduction

At a time when draft genome sequencing and assembly 
are practicable for most study organisms (Ellegren 
2014), other types of critical genetic information may 
represent limiting steps in population and evolutionary 
genetic studies. One such factor is detailed linkage 
maps and the associated inference of how the rate of 
recombination varies across the genome (Dumont & 
Payseur 2008). Linkage maps enable anchoring and 
ordering of scaffolds along chromosomes (Heliconius
Genome Consortium 2012; Huang et al. 2013). This is
necessary for making full use of the unprecedented 
power provided by next-generation sequencing technol- 
ogy, which, in the absence of physical mapping 
approaches (like BAC-end sequencing and fingerprint- 
ing), does not provide chromosome sequences. More- 
over, recombination is a critical parameter in governing 
the degree and nature of intraspecific diversity as well 
as interspecific divergence. For example, the rate of 
recombination is expected to correlate positively with 
local levels of nucleotide diversity (McGaugh et al. 2012;
Cutter & Payseur 2013; Campos et al. 2014) and with 
the rate of adaptive evolution (Presgraves 2005; Campos 
et al. 2014), and there is an increasing awareness that 
recombination moulds the evolution of base composi- 
tion via GC-biased gene conversion (Duret & Arndt 
2008; Webster & Hurst 2012). Also, recombination may 
be a critical factor in shaping the genomic landscape of 
species differentiation (Butlin 2005; Renaut et al. 2013). 

Large pedigrees are needed for the development of 
linkage maps, and obtaining such samples can be challenging
for many nonmodel species. Species that are
difficult to breed in captivity and/or to monitor and
sample in natural settings, or which have long genera- 
tion times and/or small litter sizes, are examples of 
organisms that may be problematic in this context. 
Unfortunately, this applies to many natural populations 
of species of relevance in ecological or evolutionary 
research. However, in natural populations of birds, 
acquiring pedigree material is greatly facilitated in spe- 
cies that readily accept breeding in artificial nest boxes 
and display high site fidelity. This is the case for our 
study species, the collared flycatcher (Ficedula albicollis),
and it has also made this species one of the most well-studied
avian models for questions such as life history evolution,
quantitative genetics and speciation (Ellegren
et al. 1996; Gustafsson et al. 1995; Veen et al. 2001;
Qvarnstrom et al. 2006; Saether et al. 2007; Qvarnstrom
et al. 2010; Saetre & Saether 2010; Ellegren et al. 2012).

Available evidence, notably from chicken G. gallus 
(ICGSC 2004; Groenen et al. 2009) and zebra finch Taeni- 
opygia guttata (Stapley et al. 2008; Backstrom et al. 
2010a), indicates an unusual heterogeneity in the rate of 
recombination within avian genomes. One determinant 
of this variation comes from the fact that bird chromo- 
somes differ considerably in size. Moreover, data from 
zebra finch and to some extent also chicken show a 
very strong bias for recombination in larger chromo- 
somes to be concentrated to end regions (Groenen et al. 
2009; Backstrom et al. 2010a). It is not yet known what 
the underlying mechanism or evolutionary force driving 
such pattern might be, or whether it is in fact a general 
feature of bird chromosomes. Another characteristic of 
avian genomes is an unusual stability of the karyotype
(Griffin et al. 2007; Ellegren 2010). The majority of species
have about 40 pairs of chromosomes, and interchromosomal
rearrangements are rare (Ellegren 2013).
For example, only one fusion and one fission event sep- 
arate the chicken and zebra finch karyotypes despite 
the fact that these species represent two of the most 
divergent lineages of contemporary birds (Warren et al. 
2010). However, there are indications that intrachromos- 
omal rearrangements occur more frequently (Skinner & 
Griffin 2012), although the rate and more precise pat- 
tern of this remains to be revealed. 

Here we present the development of a high-density 
genetic linkage map of the collared flycatcher based on 
genotyping with a 50 K (50 000) SNP array in a 
multi-generation pedigree of >600 birds from a natural 
population. This effort was motivated from several 
perspectives. First, having recently generated a draft flycatcher
genome assembly (Ellegren et al. 2012), we were
keen to confidently be able to place, order and orient scaf- 
folds along chromosomes and thereby arrive at an assem- 
bly with essentially continuous chromosome sequences. 
The strategy for achieving this was based on designing 
an array with SNPs from the majority of all scaffolds, 
with the aim to place these onto a linkage map. Second, 
with the access to an updated genome assembly together 
with detailed information on recombination fractions 
between markers, we wanted to investigate the recombi- 
nation landscape in an avian genome at high resolution. 
Third, with the access to a short-read, shotgun-based
genome assembly with unusual continuity, we sought to 
reveal the character of avian chromosomal evolution by 
making a high-resolution comparison of flycatcher genome
organization with the only two avian genomes physically
assembled (using cosmid and BAC clones) and
sequenced with Sanger technology, that is chicken (ICGSC
2004) and zebra finch (Warren et al. 2010).

Material and methods 

Specimens 

Blood samples were collected from collared flycatcher 
(n = 655) families breeding on the Baltic Sea island Öland
(56°44'N 16°40'E) from 2002 to 2011. The pedigree consisted
of four generations: 204 individuals in the parental
generation and 451 F1-F3 progenies (Supporting information
Fig. S1). DNA was extracted from blood samples
using a standard proteinase K digestion/phenol-chloroform
purification protocol (Sambrook et al. 1989).

Genotyping with a 50 K SNP array 

An Illumina 50 K SNP array for collared flycatcher has 
recently been developed by selecting markers from
>10 million SNPs identified in genomic resequencing of
10 unrelated collared flycatchers (from our study popu- 
lation) and 10 pied flycatchers Ficedula hypoleuca (Kawa- 
kami et al. 2014). The bulk of markers were chosen 
based on a number of criteria set to maximize the use- 
fulness in collared flycatchers, including polymorphism 
level in the sequencing sample, even distribution across 
the genome as judged by comparative map information 
vis-a-vis the zebra finch linkage map and, if possible, 
inclusion of at least two SNPs from all scaffolds >25 kb 
in a preliminary genome assembly version. Five thou- 
sand markers on the array were selected to represent 
potentially fixed differences between the two sister spe- 
cies and were thus generally less informative for intra- 
specific analyses. 

Genotyping was done with an Illumina iScan instru- 
ment. Markers that failed to pass the quality filtering 
for genotype calling were removed from subsequent 
analysis. Deviation from Hardy-Weinberg equilibrium 
(HWE) was tested for in the parental generation using 
PLINK version 1.07 (Purcell et al. 2007). After filtering out
SNPs deviating from HWE, Mendelian inheritance was 
inspected for the remaining markers using Genotype- 
Checker (Paterson & Law 2011). In total, 38 900 markers 
were polymorphic in the pedigree, of which 37 443 segregated
with a minor allele frequency (MAF) >0.05.
Among these, there were 845 putative Z-linked mark- 
ers. The low proportion of loci with rare alleles illus- 
trates the value of selecting markers based on prior 
information of polymorphism levels, in this case from 
whole-genome resequencing, in the same population. 

The inheritance analysis revealed 89 individuals with 
at least one marker that did not follow Mendelian pat- 
terns. As extra-pair paternity (EPP) is known to occur 
frequently in the collared flycatcher (Sheldon & Ellegren
1999), individuals with a high proportion of markers 
deviating from expected Mendelian segregation likely 
result from EPP. We therefore removed 46 individuals 
in which >100 markers showed inconsistent inheritance. 
The remaining 43 individuals (of the 89 individuals 
with ≥1 error) had 1-15 markers with Mendelian inconsistency
and were retained; however, the inconsistent
markers (181 in total) were removed from the subse- 
quent analysis in all individuals. In the end, we used 
genotype data from 609 individuals and 37 262 markers 
for linkage analysis. The average number of informative 
meioses in the pedigree across all markers was 187. 
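
The filtering logic described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `filter_mendelian_errors`, the input layout (a mapping from individual ID to the set of markers flagged as inconsistent in that individual) and the toy IDs are hypothetical. Only the threshold of >100 inconsistent markers and the reported counts come from the text.

```python
def filter_mendelian_errors(errors_by_individual, max_errors=100):
    """Split individuals and markers according to Mendelian inconsistencies.

    errors_by_individual: dict mapping individual ID -> set of marker IDs that
    showed inconsistent inheritance in that individual (hypothetical layout).
    Returns (removed_individuals, removed_markers).
    """
    removed_individuals = set()
    removed_markers = set()
    for individual, markers in errors_by_individual.items():
        if len(markers) > max_errors:
            # Many inconsistent markers: probable extra-pair offspring,
            # so the whole individual is dropped.
            removed_individuals.add(individual)
        else:
            # Few inconsistent markers: keep the individual, but drop
            # those markers from the data set for all individuals.
            removed_markers.update(markers)
    return removed_individuals, removed_markers


# Toy data (made-up IDs): one probable EPP individual, two retained ones.
toy_errors = {
    "ind_A": {f"snp{i}" for i in range(150)},
    "ind_B": {"snp200", "snp201"},
    "ind_C": {"snp201"},
}
removed_inds, removed_snps = filter_mendelian_errors(toy_errors)

# Bookkeeping with the counts reported in the text.
individuals_used = 655 - 46    # sampled individuals minus removed individuals
markers_used = 37443 - 181     # MAF-filtered markers minus inconsistent markers
```

With the counts given in the text, the bookkeeping reproduces the 609 individuals and 37 262 markers used for linkage analysis.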

Linkage analysis 

A genetic linkage map of collared flycatcher was constructed
using an improved version of CRI-MAP 2.503
(Green et al. 1990) developed by Ian Evans and Jill
Maddox and implementing the CRI-GEN package
provided by Xuelu Liu and Michael Grosz (Monsanto,
St. Louis, MO, USA). A detailed account for the differ- 
ent steps in the construction of the map is described in 
the Supporting information (Appendix S1). These
included calculating pairwise LOD scores using TWOPOINT
and the formation of linkage groups using
AUTOGROUP. BUILD was used for making best-order 
linkage maps. 

An updated genome assembly based on high-density 
genetic linkage data 

Markers incorporated in the genetic map were mapped 
to FicAlb_1.4 with BWA (Li & Durbin 2010). Discrepancies
in the form of scaffolds including markers from
more than one linkage group were indicative of scaffold 
chimerism in the assembly and were corrected as 
described in Supporting information (Appendix S1). The
ends of all new scaffolds were scanned for mate-pair 
reads with their mate on a different scaffold end, repre- 
senting a means for using mate-pair information that the 
assembler had failed to automatically integrate in the 
scaffolding process. This was done separately for each 
mate-pair library described in Ellegren et al. (2012),
which had insert sizes of 2.4, 4.1, 5.1, 18 and 21 kb, 
respectively, where insert size conservatively was let to 
define the length of what was considered as the scaffold 
end. For each end, links were sorted and counted, and 
the paired scaffold with most hits was considered for 
possible adjacency. To infer a physical connection 
between two scaffolds, we then applied a reciprocal cri- 
terion requesting that the number of links to the poten- 
tial neighbour had to be higher than the number of links 
to the second and third best hit together. We refer to 
scaffolds connected in this way as super-scaffolds. The 
new assembly was named FicAlb1.5 and is deposited in
GenBank under the accession no. AGTO02000000. 
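
The reciprocal link criterion used for building super-scaffolds can be sketched as below. This is only an illustration for a single mate-pair library: the function names, the data layout (a mapping from each scaffold end to its link counts towards other scaffold ends) and the toy scaffold names are assumptions, and the authors' own implementation is not described beyond the rule itself.

```python
from collections import Counter


def best_neighbour(link_counts):
    """Return the best-linked scaffold end if its link count is higher than
    the counts of the second and third best hits together, else None."""
    ranked = Counter(link_counts).most_common(3)
    if not ranked:
        return None
    best_end, best_n = ranked[0]
    runner_up_total = sum(n for _, n in ranked[1:])
    return best_end if best_n > runner_up_total else None


def reciprocal_joins(links):
    """links: dict mapping scaffold end -> {other scaffold end: link count}.
    Returns the set of end pairs that pass the criterion in both directions."""
    joins = set()
    for end, counts in links.items():
        partner = best_neighbour(counts)
        if partner is None:
            continue
        if best_neighbour(links.get(partner, {})) == end:
            joins.add(frozenset((end, partner)))
    return joins


# Toy link counts (hypothetical scaffold ends, "L" = left end, "R" = right end).
toy_links = {
    "s1:R": {"s2:L": 40, "s3:L": 5, "s4:R": 3},
    "s2:L": {"s1:R": 38, "s5:L": 2},
    "s3:L": {"s1:R": 5, "s6:R": 4, "s7:L": 3},
}
toy_joins = reciprocal_joins(toy_links)
```

In the toy data only `s1:R` and `s2:L` are joined; `s3:L` is left unconnected because its best hit (5 links) does not exceed the second and third best hits together (7 links).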

Avian karyotypes are notoriously difficult to resolve 
due to the very large number of minute microchromosomes;
most birds have 2n ≈ 80 with the size of about
half of the chromosomes <10 Mb. Only the chicken
karyotype has been well characterized (Masabanda et al. 
2004). This, coupled with the observation of a very high 
degree of synteny conservation among birds (Ellegren
2013), has led to a convention in avian genome sequenc- 
ing efforts of numbering chromosomes according to 
homologous chicken chromosomes, even if this does 
not exactly match decreasing physical size in the focal 
species (Warren et al. 2010). Treatment of fusions or 
fissions can be illustrated by the nomenclature 
adopted in the zebra finch genome sequencing project 
(Warren et al. 2010), the second avian genome to be 
sequenced. For example, chicken chromosome 4 corresponds
to two chromosomes in zebra finch, the result
of a fusion in the galliform lineage. In zebra finch, these
chromosomes are referred to as 4 (the larger) and 4A. 
We have followed this practice as a useful nomencla- 
ture for flycatcher chromosomes, and this was not least 
motivated by the observation that flycatcher and zebra 
finch chromosomes were completely syntenic, without 
strong evidence for interchromosomal rearrangements. 

Analyses of chromosomal rearrangements 

Ordered and oriented flycatcher scaffolds were concate- 
nated into chromosome sequences with an arbitrary 
gap size of 5 kb. Repeat-masked flycatcher, zebra finch 
(taeGut3.2.4) and chicken (WASHUC2) assemblies were
aligned with progressiveMauve (Darling et al. 2010) with
default settings, one chromosome at a time. Anchors
including all three species were extracted from the
backbone file and given as input to GRIMM (Tesler 2002),
to be grouped in syntenic blocks with the minimum
block size set to 50 kb, unless otherwise stated. MGR
(Bourque & Pevzner 2002) was then used for inferring
rearrangement events between species, which were
essentially only in the form of inversions (see Supporting
information Appendix S1).

Flycatcher chromosome sequences were also aligned 
to zebra finch only with LASTZ (Harris 2007). Anchors
that overlapped in either of the genomes were filtered, 
saving the longest one only if the alignment score was 
more than 1.5 times higher than for the anchors it over- 
lapped with. Regions with several ambiguous overlap- 
ping anchors with similar length and alignment score 
were removed completely. The filtered unique anchors 
were grouped into syntenic blocks with GRIMM as
described above. Unaligned regions between syntenic 
blocks were considered as break points. To narrow 
down these regions further, we used CASSIS (Baudet et al.
2010), which attempts to find the precise break point 
location by a local realignment strategy. In this way, 
most break points decreased in size; however, for a few 
that CASSIS failed to narrow down, we kept the original 
break point positions. IntersectBed from BEDTools (Quinlan
& Hall 2010) was used for extracting overlaps with
known repeats and genes. For all rearrangement analy- 
ses, we only included flycatcher scaffolds that were con- 
fidently both oriented and ordered based on direct 
evidence from linkage or mate-pair data. 
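
The anchor-filtering rule described above (keep the longest of a set of overlapping anchors only if its alignment score is more than 1.5 times higher than that of the others, otherwise discard the whole ambiguous region) can be sketched as follows. This is a simplified illustration and not the authors' code: it works on one genome's coordinates only, groups anchors into chains of overlaps, and compares the longest anchor with every other anchor in its group. The function names and the tuple layout are assumptions.

```python
def _resolve(cluster, score_ratio):
    """Apply the keep/discard rule to one group of overlapping anchors."""
    if len(cluster) <= 1:
        return list(cluster)
    longest = max(cluster, key=lambda a: a[1] - a[0])
    others = [a for a in cluster if a is not longest]
    if all(longest[2] > score_ratio * a[2] for a in others):
        return [longest]
    return []  # ambiguous region: remove completely


def filter_overlapping_anchors(anchors, score_ratio=1.5):
    """anchors: list of (start, end, score) tuples on one genome, with
    half-open coordinates. Returns the anchors kept after filtering."""
    kept = []
    cluster = []
    cluster_end = None
    for anchor in sorted(anchors):
        start, end, _ = anchor
        if cluster and start < cluster_end:
            # Overlaps the current group: extend it.
            cluster.append(anchor)
            cluster_end = max(cluster_end, end)
        else:
            kept.extend(_resolve(cluster, score_ratio))
            cluster = [anchor]
            cluster_end = end
    kept.extend(_resolve(cluster, score_ratio))
    return kept


# Toy anchors (made-up coordinates and scores).
toy_anchors = [
    (0, 1000, 900), (800, 1200, 300),       # clear winner: first anchor kept
    (5000, 6000, 500), (5500, 6400, 480),   # similar scores: both removed
    (9000, 9500, 100),                      # no overlap: kept
]
filtered = filter_overlapping_anchors(toy_anchors)
```

In the text the filter is applied to overlaps in either genome, so a fuller version would presumably run the same test on both coordinate systems.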

Recombination rate analysis 

We estimated recombination rates in 200-kb windows 
across the flycatcher genome using the updated assem- 
bly version as reference. This was done by calculating 
recombination fractions between all adjacent markers 
in the best-order linkage map and assigning window-specific
estimates based on the weighted average
recombination rate for all marker pairs present within, 
or flanking, a window. We calculated each window's 
distance to nearest chromosome end, as well as its gene 
density (proportion of exonic sequence), GC content, 
repeat content separated into the two classes 
'interspersed repeats' and 'microsatellites' (RepeatMasker;
Smit, Hubley, and Green; http://repeatmasker.org),
and the presence of previously identified
(CCNCCNTNNCCNC and CCTCCCT; Myers et al. 
2010) and de novo discovered (see below) sequence 
motifs associated with high recombination regions. We 
subsequently omitted all windows spanning a scaffold 
gap in the genome assembly. This resulted in a set of 
4749 windows for which estimates of both recombina- 
tion rate and the listed genomic parameters were avail- 
able. 
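
A window-based estimate of this kind can be sketched as below. The text does not state how the marker pairs were weighted, so this sketch assumes that each marker interval contributes in proportion to the number of base pairs it shares with the window; the function name and the input layout are likewise hypothetical.

```python
def window_rates(markers, chrom_length, window=200_000):
    """markers: list of (position_bp, position_cM), sorted by position_bp.
    Returns one cM/Mb estimate per window (None where no marker interval
    overlaps the window)."""
    # Rate (cM/Mb) for every interval between adjacent markers.
    intervals = []
    for (p1, g1), (p2, g2) in zip(markers, markers[1:]):
        if p2 > p1:
            intervals.append((p1, p2, (g2 - g1) / ((p2 - p1) / 1e6)))

    rates = []
    for w_start in range(0, chrom_length, window):
        w_end = min(w_start + window, chrom_length)
        weighted_sum = 0.0
        covered = 0
        for p1, p2, rate in intervals:
            # Assumed weight: overlap (bp) between interval and window.
            overlap = min(p2, w_end) - max(p1, w_start)
            if overlap > 0:
                weighted_sum += rate * overlap
                covered += overlap
        rates.append(weighted_sum / covered if covered else None)
    return rates


# Toy chromosome of 400 kb with four markers (made-up positions).
toy_markers = [(0, 0.0), (100_000, 1.0), (300_000, 1.0), (400_000, 2.0)]
toy_rates = window_rates(toy_markers, chrom_length=400_000)
```

In the toy example each 200-kb window is half covered by an interval at 10 cM/Mb and half by an interval without recombination, giving 5 cM/Mb for both windows.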

The variables were transformed to reduce skewness 
in their distributions; recombination rate was log-trans- 
formed to base 10 after adding a constant of 1 to pre- 
serve zero rate values, chromosome size was log- 
transformed to base 10, distance to chromosome end 
was standardized by chromosome size giving values 
ranging between 0 and 1, and microsatellite density, 
repeat density, motif density, gene density and GC con- 
tent were square-root-transformed. For each parameter, 
we calculated the raw correlation with the recombina- 
tion rate using the Pearson correlation statistic. We sub- 
sequently fitted a multiple linear regression (MLR) 
model using recombination rate as the response vari- 
able to investigate whether the variation could be 
explained by variation in the candidate explanatory 
variables. As an initial step, we investigated the rela- 
tionship among the candidate explanatory variables by 
cluster analysis based on the pairwise correlations. This 
revealed that all of the considered genomic features 
were highly interrelated with each other (Supporting 
information Fig. S2). In particular, microsatellite and 
motif density both correlated strongly with chromo- 
some size (r = —0.45 and —0.43, respectively). As small 
chromosomes only showed limited variation in some of 
explanatory variables, we focused the analysis on chro- 
mosomes larger than 20 Mb. 
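
The transformations and raw correlations described in this subsection can be written out in a few lines. The sketch below uses only the Python standard library; the helper names and the toy values are assumptions, and it covers the log, square-root and Pearson steps only (not the MLR or cluster analysis).

```python
import math


def log10_plus_one(values):
    """log10(x + 1), which keeps zero recombination rates defined."""
    return [math.log10(v + 1.0) for v in values]


def sqrt_transform(values):
    """Square-root transformation used for densities and GC content."""
    return [math.sqrt(v) for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy windows (made-up values): recombination rate in cM/Mb and GC content.
toy_rec = [0.0, 1.5, 3.0, 9.0]
toy_gc = [0.38, 0.40, 0.42, 0.47]
r_rec_gc = pearson_r(log10_plus_one(toy_rec), sqrt_transform(toy_gc))
```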

Correlations between explanatory variables can create 
biases in regression-like analysis, and inference about 
causal relationships based on MLR analysis thus needs 
to be made carefully. Beside standard MLR and pair- 
wise correlation analysis, we thus performed a principal 
component regression (PCR) analysis using recombination
rate as the response variable and the six genomic
features as candidate explanatory variables. All regres- 
sion analyses were performed after Z-transformation of 
the explanatory variables, which means standardization 
of the mean value to 0 and of the standard deviation
to 1. We also ran MLR and PCR using GC content as
response variable as a proxy for the long-term recombi- 
nation rate, motivated by the previously seen correla- 
tion between GC content and recombination in birds 
(Mugal et al. 2013) and the general idea that long-term
recombination rate influences GC content via biased 
gene conversion (Duret & Arndt 2008). 

In a specific test of the relationship between recombi- 
nation rate and distance to chromosome end, we 
performed a locally weighted polynomial regression 
(lowess regression) with a smoothing parameter value 
of 0.5. Based on the lowess regression, we classified 
'end regions' as regions <5.5 Mb from the chromosome 
end, and 'centric regions' as regions >5.5 Mb from the 
chromosome ends (Supporting information Fig. S3). The 
mean recombination rate was then compared between 
regions. 
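
The end versus centric classification can be illustrated with a short sketch. It assumes that distance is measured from the window midpoint to the nearest chromosome end and that a window at exactly 5.5 Mb counts as centric; neither detail is specified in the text, and the function names and toy values are hypothetical.

```python
END_REGION_BP = 5_500_000  # threshold taken from the text (5.5 Mb)


def classify_window(start, end, chrom_size, threshold=END_REGION_BP):
    """Label a window 'end' or 'centric' by its distance to the nearest
    chromosome end (measured from the window midpoint, an assumption)."""
    midpoint = (start + end) / 2
    distance_to_end = min(midpoint, chrom_size - midpoint)
    return "end" if distance_to_end < threshold else "centric"


def mean_rate_by_region(windows, chrom_size):
    """windows: list of (start, end, rate_cM_per_Mb) on one chromosome.
    Returns the mean rate for each region class that is present."""
    sums = {"end": 0.0, "centric": 0.0}
    counts = {"end": 0, "centric": 0}
    for start, end, rate in windows:
        region = classify_window(start, end, chrom_size)
        sums[region] += rate
        counts[region] += 1
    return {region: sums[region] / counts[region]
            for region in sums if counts[region]}


# Toy windows on a 50-Mb chromosome (made-up rates).
toy_windows = [
    (0, 200_000, 8.0),
    (200_000, 400_000, 6.0),
    (25_000_000, 25_200_000, 1.0),
]
toy_means = mean_rate_by_region(toy_windows, chrom_size=50_000_000)
```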

To investigate sequence context effects on the recom- 
bination rate, we divided the data into 'hot' and 'cold' 
recombination regions. The hot regions consisted of the 
2.5% (n = 112) marker intervals with highest recombina- 
tion rate and the cold regions consisted of the 2.5% 
(n = 112) marker intervals with lowest recombination 
rate (effectively, the latter translates to the subset of 
marker pairs with the longest physical distance between 
markers without any evidence for recombination in the 
pedigree). We used these categories and searched for 6 
to 10-bp sequence motif enrichment in hot regions 
using the homer2 de novo option in HOMER 4.2 (Heinz
et al. 2010). As suggested by Heinz et al. (2010), we 
applied a stringent significance threshold for enrich- 
ment of (10^^") and removed complete redundancies. 
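
The selection of 'hot' and 'cold' marker intervals can be sketched as follows, ahead of any motif search. This is a hedged illustration only: the tuple layout and function name are assumptions, and the cold set is chosen in the way the text says it effectively works out, namely the intervals without recombination that span the longest physical distance.

```python
def hot_and_cold_intervals(intervals, fraction=0.025):
    """intervals: list of (length_bp, genetic_length_cM) for adjacent
    marker pairs. Returns (hot, cold) lists of intervals."""
    n = max(1, round(len(intervals) * fraction))

    def rate(interval):
        length_bp, cm = interval
        return cm / (length_bp / 1e6)  # cM/Mb

    # Hot: the top fraction of intervals ranked by recombination rate.
    hot = sorted(intervals, key=rate, reverse=True)[:n]
    # Cold: zero-recombination intervals, longest physical distance first.
    no_recombination = [iv for iv in intervals if iv[1] == 0]
    cold = sorted(no_recombination, key=lambda iv: iv[0], reverse=True)[:n]
    return hot, cold


# Toy data: 40 intervals of increasing length, every second one with
# no observed recombination (made-up values).
toy_intervals = [(100_000 * (i + 1), 0.0 if i % 2 == 0 else 1.0)
                 for i in range(40)]
toy_hot, toy_cold = hot_and_cold_intervals(toy_intervals)
```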

Results 

A high-density linkage map of the collared flycatcher 
genome 

Linkage analysis first mapped 731 markers to unique 
positions on 31 linkage groups in a preframework map 
with the stringent threshold of LOD > 5. The iterative 
addition of markers by pairwise linkage scoring 
between preframework markers and the remaining 
36 531 markers subsequently assigned a total of 33 627 
markers to 34 different linkage groups, including the 
three new linkage groups Fal34-Fal36. We then ordered 
markers within linkage groups, and the resulting frame- 
work map (marker order supported by LOD > 3.0) was 
composed of 2456 ordered markers with a total genetic 
distance of 3256 cM and a mean genetic distance 
between adjacent markers of 1.37 cM (±1.68 SD; 
Table 1). We included additional SNPs in this map by 
stepwise lowering the LOD threshold down to 
LOD > 0.1 (see Supporting information Appendix S1),
providing a best-order map containing 4302 markers 
and spanning 3256 cM in autosomes and 161 cM in the 
Z chromosome (Table 1, Fig. 1, Supporting information 
Fig. S4). The mean genetic distance between adjacent 
markers in the best-order map was 0.69 cM (±1.10 SD). 
Finally, there were 33 627 markers at this stage that 
were assigned to one of the linkage groups but not 
placed on the best-order map. Of these, 31 867 
unmapped markers were located in scaffolds containing 
best-order markers; therefore, their physical locations 
could be inferred. When these markers were forced to 
be included in the map based on their physical position 
in the respective scaffolds, 76% of markers (24 231) had 
zero genetic distance with already mapped best-order 
markers. The total genetic distance of the forced order 
map was inflated by 13% (to 3690 cM), likely at least in
part due to small errors in marker order (Table 1). The 
mean genetic distance between adjacent markers in this 
forced map was 0.12 cM (±0.73 SD). 

A second-generation assembly of the flycatcher genome 

The draft assembly of the collared flycatcher genome 
(version FicAlb_1.4; Ellegren et al. 2012) lacks unambiguous
information on the order and orientation of scaffolds
along most chromosomes. For example, as the draft 
assembly was based on a coarse linkage map, scaffold 
ordering had in many cases to be based on indirect 
information from the assumption of conserved synteny 
relative to the zebra finch genome. In addition, 55% of 
the scaffolds remained unanchored to linkage groups/ 
chromosomes. With the aid of the new linkage map, we 
were able to anchor, order and orient scaffolds corre- 
sponding to 95.7% (1.013 Gb) of the final 1.058 Gb 
assigned to chromosomes (Table 2). We then constructed 
super-scaffolds by scanning scaffold ends for mate-pair 
links to all other scaffolds, assigned as well as unas- 
signed, resulting in the incorporation of 43 previously 
unassigned and mostly small scaffolds (mean size of 
68.2 kb, a total of 2.9 Mb) into the assembly. For another 
40 scaffolds (mean size 660 kb, 26.4 Mb in total) that 
had only been indirectly placed in the assembly based 
on information on the location of homologous sequence 
in zebra finch, we could confirm ordering and confirm 
or establish orientation. Finally, and importantly, links 
were established between adjacent scaffolds for 210 of 
the 394 gaps in the assembly (285 of a total of 437 gaps 
after the inclusion of the 43 previously unassigned short 
scaffolds). As the mate-pair libraries from which these 
links were established had insert sizes of 2-20 kb, this 
indicates the maximum size of the gaps. In no case did
we find evidence for links between scaffolds that were 
not placed immediately adjacent to each other, strongly 
validating the overall accuracy of the assembly.

Table 1 Number of mapped SNP markers and total genetic distance of each chromosome of the collared flycatcher

         Number of markers                Framework map (cM)         Best-order map (cM)        Forced map (cM)
Chrom.   Framework  Best-order    Forced  Average   Female     Male  Average   Female     Male  Average   Female     Male
[...]   56.0   50.5   51.0   [...]
23              45          96     [...]    [...]    [...]     41.8     49.1     53.2     44.7     50.2     52.6     47.6
24              71          98     [...]     50.6     49.2     52.0     50.5     51.3     50.2     54.7     51.3     58.6
25              12          22        60     46.2     43.7     51.0     47.9     45.6     53.0     50.0     48.5     54.0
26              48          81       950     46.7     50.7     41.6     46.3     48.1     43.2     88.5     98.0     79.6
27              42          73       534     74.9     69.4     83.6     73.5     68.0     82.2     85.1     80.6     97.7
28              24          39       188     48.9     49.9     49.2     48.2     49.3     49.5     66.2     72.3     60.4
LGE22           16          32        38     52.3     52.1     51.6     53.3     53.0     53.6     49.8     43.5     55.8
Fal34            2          94        10     12.1     31.0      6.9     16.3     20.7     11.9     15.4     19.1     11.5
Fal35            5           8         8     36.8     35.5     39.3     37.2     34.7     39.6     36.6     34.7     39.0
Fal36            2           5         5      2.9        0      5.6      9.7      6.5     13.8      9.7      6.5     13.8
Total*        2377        4213    31 124     3148     3038     3278     3142     2997     3300     3574     3354     3846
Z               79          89       743   107.7†              161.6   107.5†              161.2   115.9†              173.9
Total‡        2456        4302    31 867     3256     3038     3440     3249     2997     3461     3690     3354     4020

*Autosomes.
†Sex-average genetic distance for chromosome Z calculated as male genetic distance × 2/3.
‡Autosomes plus Z chromosome.



The new assembly (FicAlb1.5) has an N50 super-scaffold
size of 20.2 Mb (17.4 Mb if including singleton
scaffolds) and covers 33 autosomes and the Z chromo- 
some (Table 2, Supporting information Fig. S5). The 
great majority of chromosomes are nearly fully covered 
by 1-5 super-scaffolds, that is not far from continuous 
chromosome sequences; four chromosomes do in fact 
correspond to a single super-scaffold and one to a 
single scaffold (Table 2). The assembly includes 
sequence data for four microchromosomes that are not 
represented by defined chromosomes in the chicken
assembly (which has sequence data from chromosomes
1-28 and 32). One of these, including 2.1 Mb of fly- 
catcher sequence and covering a genetic distance of 
53 cM, has sequence homology to chicken linkage 
group LGE22. The other three (Fal34 with 16 cM, Fal35 
with 37 cM and Fal36 with 10 cM) show no sequence 
homology to assembled sequence from the chicken gen- 
ome. The linkage map did not have the same high 
degree of resolution for the Z chromosome as for auto- 
somes due to the fact that only male meioses were 
informative. This led to a higher proportion of scaffolds
that were not ordered and/or oriented, and as such
scaffolds were not included in our final assembly, this
likely explains why the flycatcher Z chromosome
assembly was shorter (59.7 Mb) than that of chicken
(74.6 Mb) and zebra finch (72.9 Mb).

Fig. 1 A genetic linkage map of the collared flycatcher genome. The horizontal bars on each chromosome or linkage group represent
mapped SNP markers based on the best-order map. The scale bar to the left shows the lengths of linkage groups as measured in Kosambi
cM. A more detailed map with marker names is presented as Supporting information Fig. S4.

In the end, after addition of new scaffolds and scaf- 
fold orientation by linkage data and mate-pair linking, 
the ordered and oriented sequences constitute 98.6% 
(1.042 Gb) of the scaffolds assigned to chromosomes. Of 
the total assembly also including unassigned scaf- 
folds (1.116 Gb), 93.4% of the sequence was anchored, 
ordered and oriented along chromosomes. This repre- 
sents a considerable improvement compared to the pre- 
vious assembly (596 Mb or 56.5% anchored; Table 3). 
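
For readers less familiar with assembly statistics, the two summary figures used here can be computed as in the sketch below. The `n50` helper and the toy scaffold lengths are illustrative assumptions; only the 1.042 Gb and 1.116 Gb values come from the text.

```python
def n50(lengths):
    """Length L such that sequences of length >= L together contain at
    least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy super-scaffold lengths in Mb (made-up values).
toy_n50 = n50([2, 4, 6, 8, 10])

# Share of the total assembly that is anchored, ordered and oriented,
# using the sizes reported in the text (in Gb).
anchored_percent = round(1.042 / 1.116 * 100, 1)
```

With the reported sizes, the anchored share evaluates to the 93.4% given in the text.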

Highly conserved DNA content of avian chromosomes 

The high degree of genome coverage coupled with the 
unusual continuity in scaffolded sequence along each 
chromosome gives unprecedented power and resolution 
to study the rate and pattern of chromosomal rearrangements
during avian evolution. We made whole-genome
alignments of flycatcher, chicken and zebra finch,
inferred syntenic blocks >50 kb in size and identified
chromosomal rearrangements. Despite the two lineages
having split ≈40 million years (My) ago (Nabholz et al. 2011),
flycatcher and zebra finch chromosomes are entirely
syntenic without clear-cut evidence of interchromosomal
rearrangements (see Supporting information Fig. S6 for
a comment on the tentative chromosome 1B in zebra
finch), testifying to the rather extreme karyotypic stability
of birds. The flycatcher assembly confirms the only
two clear cases of interchromosomal rearrangement distinguishing
the chicken and zebra finch karyotypes.
First, flycatcher and zebra finch have two chromosomes,
chromosomes 1 (≈120 Mb) and 1A (≈75 Mb), which
correspond to the single chromosome 1 of chicken
(201 Mb), a result of a fission in the passeriform lineage.
Second, chicken chromosome 4 (94 Mb) corresponds in
both flycatcher and zebra finch to two chromosomes,
chromosomes 4 (≈70 Mb) and 4A (≈21 Mb), resulting
from a fusion in the galliform lineage.

The assembly sizes of individual chromosomes were 
remarkably similar among chicken, zebra finch and

Table 2 Number of ordered and oriented scaffolds assigned to each collared flycatcher chromosome in the FicAlb1.5 assembly version.
Also shown is assembly size of homologous chromosomes of flycatcher, zebra finch and chicken

            Collared flycatcher
Chromosome  Scaffolds Super-scaffolds  Singleton Not oriented  Size (Mb) Zebra finch  Chicken
                                      scaffolds*    scaffolds                   (Mb)     (Mb)
1                  43               6          3            1      119.8      119.6†    201.0
1A                 38               8          6            2       74.8        73.7        -
2                  35               8          3            1      157.4       156.4    154.9
3                  28               6          5            1      115.7       112.6    113.7
4                  26               4          4            1       70.3        69.8     94.2
4A                  5               2          1            1       21.2        20.7        -
5                  22               5          5            0       64.6        62.4     62.2
6                  11               2          1            0       37.2        36.3     37.4
7                  14               3          0            0       39.3        39.8     38.4
8                  12               3          0            0       32.0        28.0     30.7
9                   7               3          0            0       26.8        27.2     25.6
10                 11               2          0                    21.3        20.8     22.6
11                  5               2          1            0       21.7        21.4     21.9
12                 11               2          1            0       21.9        21.6     20.5
13                  6               2          2            0       18.6        17.0     18.9
14                  2               1          0            0       17.4        16.4     15.8
15                  1               0          1            0       14.9        14.4     13.0
16                  0               0          0            0          -       <0.01     0.43
17                  4               2          0            0       12.4        11.6     11.2
18                 13               1          1            1       13.1        11.2     10.9
19                  6               2          0            0       11.9        11.6      9.9
20                  8               2          1            0       15.6        15.7     14.0
21                  5               1          2            1        8.1         6.0      7.0
22                  8               3          1            3        5.7         3.4      3.9
23                  5               1          0            0        7.9         6.2      6.0
24                  4               1          0            0        8.0         8.0      6.4
25                 19               3          2            1        2.7         1.3      2.0
26                  6               1          2            1        7.6         4.9      5.1
27                 19               4          2            7        5.5         4.6      4.8
28                 12               3          0            3        6.1         5.0      4.5
LGE22‡             10               3          3            4        2.1         0.9      0.9
Fal34               4               1          1            4      0.11§
Fal35               0               0          0            7         0¶
Fal36               1               0          1            3     0.18**
Z                  32               8          5            7       59.7        74.6     72.9
Total                                                               1042        1023     1031

*Scaffolds which could not be joined to other scaffolds in the super-scaffolding process.
†1.1 Mb from the tentative chromosome 1B in zebra finch has been added to 1A of this species.
‡In the most recent chicken genome assembly (Galgal4), the full name of this linkage group is LGE22C19W28_E50C23.
§Fal34 has an assembly size of 0.29 Mb when including un-oriented scaffolds.
¶Fal35 has an assembly size of 0.32 Mb when including un-oriented scaffolds.
**Fal36 has an assembly size of 0.46 Mb when including un-oriented scaffolds.



flycatcher (Table 2), and did not differ by more than 
2.5 Mb in size for 26 of 29 autosomes. As a conse- 
quence, the total amount of sequence assigned to chro- 
mosomes was nearly identical in the three bird species 
(1.02-1.04 Gb), again testifying on an overall evolution- 
ary stasis of avian chromosomes. We note that, exactly 
like for chicken and zebra finch, flycatcher chromosome 
16 was difficult to sequence and assemble. The reason 



for the difficulty in sequencing and /or assembling this 
avian chromosome is to our knowledge not known. 

Frequent intrachromosomal rearrangements during 
avian evolution 

The evolutionary stability in the size and content of
avian chromosomes stands in sharp contrast to frequent
changes in the genomic organization within chromosomes
(intrachromosomal rearrangements). We found a
total of 343 inversions, which can explain the current
organization of chromosomal segments in chicken,
zebra finch and flycatcher (Table 4). As expected, most
of these (203) can be traced back to the long lineage
connecting chicken and the common ancestor of flycatcher
and zebra finch in an unrooted tree. The number
of rearrangements in the flycatcher and zebra finch
lineages was 61 and 79, respectively, and can be readily
seen in circular visualization of sequence homologies
between the two species (Fig. 2). Based on these numbers,
we estimate the rate of inversion at 1.5 (flycatcher
lineage), 2.0 (zebra finch) and 1.7 (chicken-passeriform
ancestor) events per My. This corresponds to rates of
0.0014-0.0019/My/Mb. We note that all three genome
assemblies used for making this inference are based on
genetic linkage data.

Table 3 Summary assembly statistics for the second-generation
assembly version of the collared flycatcher genome (FicAlb1.5;
present study) and the previous FicAlb_1.4 version (Ellegren
et al. 2012)

                        FicAlb_1.4                    FicAlb1.5
                        # Scaffolds     Size (Mb)     # Scaffolds     Size (Mb)
Ordered and oriented             67           596             441          1042
Ordered                          67           224              46            15
Inferred                        164           182               2             0
Unassigned             21 467 (109)      114 (73)     21 354 (73)       59 (23)
Total                  21 765 (404)   1116 (1076)    21 843 (451)   1116 (1075)

All data from scaffolds >200 bp are included, which explains
the large number of unassigned scaffolds and the large total
number of scaffolds. Data for scaffolds >100 kb are shown in
parentheses; note that excluding scaffolds <100 kb has little
influence on total assembly size. 'Inferred' means scaffolds
indirectly assigned to chromosomes based on conserved synteny
with zebra finch.

Table 4 Number of intrachromosomal rearrangements (inversions)
per chromosome in three avian lineages detected with a
resolution of 50-kb synteny blocks

Chromosome    Chicken  Zebra finch  Flycatcher
1 + 1A             29           11          18
2                   9           10           6
3                  18            6           4
4 + 4A             33            5           2
5                  11            3           3
6                   8            1           0
7                   4            5           1
8                   6            1           1
9                   5            1           0
10                  2            0           1
11                  8            4           0
12                  6            1           0
13                  3            0           0
14                  6            1           1
15                  4            1           0
17                  0            1           0
18                  4            2           0
19                  2            1           0
20                  5            1           2
21                  4            2           0
22                  3            0           0
23                  5            2           0
24                  5            2           0
25                  0            0           4
26                  4            2           3
27                  2            2           3
28                  7            2           1
Z                  10           12           6
Total             203           79          61

The size of inversions was biased towards the lower 
end of detectable events (Supporting information 
Fig. S7), with median size of 3.34 Mb (chicken), 2.62 Mb 
(zebra finch) and 0.78 Mb (flycatcher). With the propen- 
sity for inversions to be short, it was clear that many 
events would have been missed with lower assembly 
continuity and at higher thresholds for minimum size of 
syntenic blocks. This was confirmed when we increased 
block size to 100 kb, 250 kb or 1 Mb to make inference
about the number of rearrangements (Supporting information
Table S1); the total number of rearrangements
decreased from 343 at the resolution of 50 kb to 87 at 
the resolution of 1 Mb. Accordingly, the estimated rates 
of inversion decreased from 1.5-2.0 to 0.2-0.5/My. This 
highlights the importance of the level of resolution for 
characterization of chromosome rearrangements. 

Another way of quantifying the inversion rate is to 
also take the amount of inverted sequence into account. 
The total length of all inversions was 476.1 Mb (49.1% 
of the aligned sequence), 299.2 Mb (30.2%) and 
265.4 Mb (26.3%) in the chicken-passeriform ancestor, 
zebra finch and flycatcher lineage, respectively. This 
gives inversion rates of 4.0, 7.5 and 6.6 Mb inverted 
DNA per My, respectively. Note that these numbers are 
based on the amount of unique sequence involved in 
rearrangements; the sequence of nested inversions was 
only considered once. Also note that the rate estimates 
cannot be expected to increase linearly with time as, 
with a constant rate of rearrangement, the amount of 
sequence not yet inverted will decrease over time. This 
may explain the lower rate estimate for the long 
chicken-passeriform ancestor branch. 
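
The arithmetic behind these per-lineage rates can be retraced with a short sketch. It assumes that the ~40 My flycatcher/zebra finch divergence time is used as the branch length for each of the two passerine lineages; the length of the chicken-passeriform ancestor branch is not stated here, so the helper `implied_branch_my` (a hypothetical name) only back-calculates it from the reported totals and rates.

```python
# Values quoted in the text. BRANCH_MY is an assumption: the ~40 My
# flycatcher / zebra finch divergence time applied to each lineage.
BRANCH_MY = 40.0

LINEAGES = {
    "flycatcher": {"inversions": 61, "inverted_mb": 265.4},
    "zebra finch": {"inversions": 79, "inverted_mb": 299.2},
}


def inversion_rates(inversions, inverted_mb, branch_my):
    """Return (inversion events per My, Mb of inverted sequence per My)."""
    return inversions / branch_my, inverted_mb / branch_my


def implied_branch_my(total, rate_per_my):
    """Back-calculate a branch length (My) from a total and a reported rate."""
    return total / rate_per_my


if __name__ == "__main__":
    for name, data in LINEAGES.items():
        events, mb = inversion_rates(
            data["inversions"], data["inverted_mb"], BRANCH_MY
        )
        print(f"{name}: {events:.1f} events/My, {mb:.1f} Mb/My")
    # Chicken-passeriform ancestor branch: both reported rates should imply
    # roughly the same branch length if they share a denominator.
    print(implied_branch_my(203, 1.7), implied_branch_my(476.1, 4.0))
```

Under this assumption the sketch returns the 1.5 and 2.0 events per My and the 6.6 and 7.5 Mb per My given above, and the two rates reported for the chicken-passeriform ancestor branch imply nearly the same branch length (about 119 My).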

We next examined chromosomal break points and
sought to elucidate their characteristics. Here we used
pairwise alignments between flycatcher and zebra finch
to obtain higher resolution (due to the shorter
evolutionary distance). One hundred sixty-five break
point regions were identified, with a median size of
2.4 kb for measurable regions (see below; Supporting
information Fig. S8). Of these, 28 regions were reused
twice (17%). There was a very strong association
between the location of scaffold junctions in the
flycatcher assembly and chromosomal break points. Had the
165 break points been randomly distributed
in the genome, we would have expected <1 to
coincide with the location of scaffold junctions. However,
we observed 42 scaffold junctions inside break
points, clearly showing that some regions of the genome
are both resistant to sequence assembly and prone
to chromosomal mutation.
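
The expectation of fewer than one coincidence can be illustrated with a back-of-the-envelope sketch. It is not the calculation used in the study: the number of scaffold junctions (400) is a hypothetical placeholder, the median break point size is used as a stand-in for the mean, and junctions are assumed to fall uniformly at random along 1042 Mb of anchored sequence.

```python
def expected_junctions_in_breakpoints(
    n_junctions, n_breakpoints, mean_breakpoint_kb, genome_mb
):
    """Expected number of scaffold junctions that fall inside break point
    regions if junctions were placed uniformly at random along the genome."""
    covered_mb = n_breakpoints * mean_breakpoint_kb / 1000.0
    return n_junctions * covered_mb / genome_mb


expected = expected_junctions_in_breakpoints(
    n_junctions=400,         # hypothetical: not given in the text
    n_breakpoints=165,       # from the text
    mean_breakpoint_kb=2.4,  # median from the text, used in place of the mean
    genome_mb=1042.0,        # sequence anchored to chromosomes
)
observed = 42                # from the text

if __name__ == "__main__":
    print(f"expected {expected:.2f}, observed {observed}")
```

Because break point regions cover well under 0.1% of the genome, any plausible junction count gives an expectation far below one, which is what makes the 42 observed coincidences so striking.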

The distribution of break points across the genome 
was nonrandom with clusters of multiple inversion 
events interspersed with large chromosomal regions of 
structural stasis (Fig. 3). There was a propensity for 
break points to be located towards the ends of chromo- 
somes, with a significant deviation from a uniform dis- 
tribution along chromosomes (Supporting information 
Fig. S9; goodness-of-fit test, chi-square = 22.46, d.f. = 9, 
P < 0.05). Moreover, there was a negative correlation 
between chromosome size and the rate of inversion per 
Mb (Wilcoxon's test, z = 6.06, P < 0.001). Furthermore, 
several genomic parameters differed significantly 
between break point regions and the rest of genome, 
including recombination rate (mean 5.83 vs. 3.25 cM/Mb,
z = 5.74, P = 4.8 × 10^-9), GC content (mean 0.513
vs. 0.416, z = 11.28, P = 1.6 × 10^-29) and repeat density
(mean 0.221 vs. 0.096, z = 4.79, P = 8.4 × 10^-7).

Recombination rate variation 

With a high-density linkage map and a genome assembly
with a high degree of sequence continuity along
chromosomes, it is possible to obtain detailed recombination
rate estimates across the flycatcher genome. We
divided the genome into 200-kb windows and observed
a mean sex-averaged recombination rate of
3.1 ± 4.1 cM/Mb across windows. The genomic landscape
of recombination was highly heterogeneous, with
two major, large-scale trends of recombination rate variation.
First, the mean recombination rate was considerably
higher for small chromosomes than for large
chromosomes (Table 5, Fig. 4). The rate was in excess
of 10 cM/Mb for chromosomes <10 Mb; for the new
linkage group Fal35, with only 230 kb of assembled
sequence (and a genetic distance of 36.8 cM), recombination
reached an extreme estimated rate of 160 cM/Mb.
For the three chromosomes >100 Mb, the rate was
uniformly ~2.0 cM/Mb, while for chromosome size
classes in the range of 10-100 Mb, recombination rate
was intermediate and increased with decreasing
chromosome size.

Fig. 3 Genomic distribution of break point regions. Resolution is 50 kb synteny blocks in pairwise whole-genome alignments of
flycatcher and zebra finch.

Table 5 Mean (and standard deviation, SD) sex-average recombination rate per chromosome for different autosomal size categories.
Also shown are recombination rates after 50 cM (corresponding to one obligate recombination event per chromosome) has been
subtracted from the genetic length of each chromosome

                                                        Recombination rate
                                 Recombination rate     (cM/Mb) after
Chromosomal                      (cM/Mb)                subtracting 50 cM
size category           n        Mean       SD          Mean       SD
>100 Mb                 3         2.0      0.0           1.6      0.1
50-100 Mb               3         2.7      0.3           2.0      0.3
25-50 Mb                4         3.2      0.2           1.7      0.2
10-25 Mb               11         4.5      0.9           1.5      0.7
<10 Mb                  9        11.1      6.9             *

*Most of these chromosomes have a genetic distance of <50 cM.

Fig. 4 The relationship (a) between the
genetic distance (cM) and the chromosome
size (Mb) and (b) between recombination
rate (cM/Mb) and the chromosome
size (Mb) in the best-order map of the
collared flycatcher genome. Open and
filled circles are the Z chromosome and
autosomes, respectively.

It is interesting to note that the effect of chromosome
size on rate of recombination gradually diminished
with increasing chromosome size. In fact, when
50 cM was subtracted from the length of each linkage group
(reflecting one obligate interchromatid crossing-over per
chromosome, see Discussion) before calculating the
chromosome-average recombination rate as map length divided
by physical size, the rate of recombination seemed
largely independent of chromosome size (Table 5), with
genetic length increasing by 1.5-2.0 cM for every Mb of
additional physical size. In addition, it can be noted from
Fig. 4 that the mean recombination rate of the Z chromosome
(2.7 cM/Mb over 60 Mb) as measured in male
meiosis was very similar to that of similarly sized autosomes
(chromosomes 1A, 4 and 5, sized 65-75 Mb, have
a mean recombination rate of 2.7 cM/Mb).
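
The chromosome-average rate and its obligate-crossover correction described above amount to two one-line formulas, sketched below. The Fal35 figures come from the text; the function names and the two example chromosomes in the usage section are hypothetical and serve only to show the direction of the effect.

```python
OBLIGATE_CROSSOVER_CM = 50.0  # one obligate crossing-over per chromosome


def recombination_rate(map_length_cm, size_mb):
    """Chromosome-average recombination rate (cM/Mb)."""
    return map_length_cm / size_mb


def adjusted_rate(map_length_cm, size_mb, obligate_cm=OBLIGATE_CROSSOVER_CM):
    """Rate (cM/Mb) after subtracting the obligate-crossover contribution."""
    return (map_length_cm - obligate_cm) / size_mb


if __name__ == "__main__":
    # Fal35: 36.8 cM over 230 kb of assembled sequence (values from the text)
    print(round(recombination_rate(36.8, 0.23)))
    # Hypothetical large and small chromosomes, for illustration only
    for map_cm, size_mb in [(240.0, 120.0), (85.0, 20.0)]:
        print(size_mb, recombination_rate(map_cm, size_mb),
              adjusted_rate(map_cm, size_mb))
```

In the hypothetical pair, the raw rates differ by more than a factor of two (2.0 vs. 4.25 cM/Mb), while the adjusted rates are close (about 1.6 vs. 1.75 cM/Mb), mirroring the pattern in Table 5. For chromosomes with a map length below 50 cM the adjusted rate becomes negative, which is why no adjusted value is given for the <10 Mb class.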

Second, there was a significant increase in recombina- 
tion rate towards chromosome ends, a pattern consis- 
tent irrespective of chromosome size (Fig. 5). For 
instance, the average recombination rate in the ends, 
defined as the distal 5.5 Mb of each chromosome end 
(see Material and methods and Supporting information 
Fig. S3 for motivation) of chromosomes 1-6 was 
5.7 cM/Mb, while the internal regions of these chromo- 
somes had a mean rate of 2.3 cM/Mb. Similarly, the 
average recombination rate at chromosome ends and 
interior regions of smaller chromosomes 7-28 was 7.0 
and 2.4 cM/Mb, respectively. 
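
A minimal sketch of this end-versus-interior comparison is given below. It assumes that each 200-kb window is classified by the distance of its midpoint to the nearest chromosome end; the study may have assigned windows differently, and the toy chromosome in the usage section is hypothetical.

```python
WINDOW_MB = 0.2  # 200-kb windows, as in the text
END_MB = 5.5     # distal region treated as "chromosome end", as in the text


def split_end_interior(window_rates, chrom_size_mb,
                       end_mb=END_MB, window_mb=WINDOW_MB):
    """Return (mean rate in chromosome ends, mean rate in the interior).

    window_rates[i] is the recombination rate (cM/Mb) of the window that
    starts at i * window_mb. A window counts as 'end' when its midpoint lies
    within end_mb of either chromosome end (an assumption of this sketch).
    """
    ends, interior = [], []
    for i, rate in enumerate(window_rates):
        midpoint = (i + 0.5) * window_mb
        distance_to_end = min(midpoint, chrom_size_mb - midpoint)
        (ends if distance_to_end <= end_mb else interior).append(rate)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(ends), mean(interior)


if __name__ == "__main__":
    # Hypothetical 20-Mb chromosome: elevated rate in the outer 5 Mb
    toy = [6.0] * 25 + [2.0] * 50 + [6.0] * 25
    print(split_end_interior(toy, chrom_size_mb=20.0))
```

For chromosomes shorter than twice the end size, every window falls in the end class and the interior mean is undefined, so such chromosomes have to be handled separately.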

The total map length was on average 10% longer in
males than females (3300 and 2997 cM in the best-order
autosomal map, respectively; Wilcoxon's test for
matched pairs of windows, V = 104, P = 0.002). There
was limited regional variation in sex-specific recombination
rates (Supporting information Fig. S10); however,
chromosomes 17 and 27 were exceptions, showing
marked differences between the sexes in 2-3 Mb
regions (Fig. 6; Table 1). To test whether these differences
were repeatable, we divided the pedigree into six
subsets of individuals (n = 100 each) and estimated
sex-specific recombination rates in each subset. In
chromosome 27, all six subsets showed larger total genetic
distance in males (total genetic distance = 34.2-76.4 cM
in females and 81.0-129.0 cM in males), and in
chromosome 17, five of six subsets showed larger total genetic
distance in males (total genetic distance = 59.1-85.5 cM
in females and 76.4-89.9 cM in males). This suggests
that there is a true signal of sex differences in recombination
rate in these chromosomes.

© 2014 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

Fig. 5 Scatter plot showing the recombination
rate as a function of distance to
chromosome end (Mb) for (a) chromosomes
>20 Mb and (b) chromosomes <20 Mb.

Fig. 6 Sex-specific relationships between
the genetic (cM) and the physical (Mb)
distance in the best-order map of collared
flycatcher chromosomes 17 (a) and
27 (b). Red and blue circles indicate the
female- and male-specific genetic map,
respectively.

In order to search for sequence motifs potentially
associated with high recombination rates, we partitioned
the rate between all marker pairs into two extreme classes,
representing the regions with the 2.5% highest ('hot
regions') and 2.5% lowest ('cold regions') rates. The
previously described (Myers et al. 2008; Winckler et al.
2005) sequence motifs CCNCCNTNNCCNC and
CCTCCCT associated with high recombination were
both present at higher density (2.1 and 1.6 times, respectively)
in the hot regions than in the cold regions,
although this was not statistically significant. We also
searched for enrichment of previously unidentified
sequence motifs in hot regions, and, after correcting for
redundancy and multiple testing, we found evidence for
enrichment of six different sequence motifs 6-9 bp long:
GATGAGATG, AATCAATC, GAAGGAGA, CCATATC,
GGATCC and TCGAGG (Supporting information
Table S2).
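
Motif density of the kind compared between hot and cold regions can be computed along the following lines. This is a sketch, not the pipeline used in the study: it assumes that N matches any of the four bases, that overlapping occurrences are counted, and that both strands are searched (palindromic motifs such as GGATCC are counted once per site).

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence or motif (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a motif into a regex; N matches any base. The lookahead
    makes overlapping occurrences count separately."""
    pattern = "".join("[ACGT]" if base == "N" else base for base in motif)
    return re.compile("(?=" + pattern + ")")


def count_motif(sequence, motif):
    """Count motif occurrences on both strands of `sequence`."""
    sequence = sequence.upper()
    count = len(motif_regex(motif).findall(sequence))
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double counting palindromic motifs
        count += len(motif_regex(rc).findall(sequence))
    return count


def motif_density_per_kb(sequence, motif):
    """Motif occurrences per kb of sequence."""
    return 1000.0 * count_motif(sequence, motif) / len(sequence)


if __name__ == "__main__":
    print(count_motif("AACCTCCCTAA", "CCTCCCT"))
    print(count_motif("CCACCATAACCAC", "CCNCCNTNNCCNC"))
```

The ratio of `motif_density_per_kb` between the pooled hot-region and cold-region sequences would then correspond to the fold differences quoted above.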

Several genomic parameters have previously been
shown to be associated with recombination rate variation
in other organisms (Coop & Przeworski 2007; Webster
& Hurst 2012; Cutter & Payseur 2013). Focusing on
chromosomes >20 Mb, we found significant pairwise
correlations between recombination rate and chromosome
size, distance to chromosome end (as shown for
all chromosomes, described above), microsatellite density,
sequence motif density and gene density (Table 6).
To disentangle the relative effect of each of these
parameters, we performed multi-linear regression (MLR)
and principal component regression (PCR)
analyses using recombination rate as response variable.
This showed that microsatellite density, motif density
and distance to chromosome end explained most of the
variation in recombination rate, while the impact of
chromosome size, gene density and interspersed repeat
density was of minor importance (Table 6). The relatively
limited effect of chromosome size was probably related
to the fact that we only analysed chromosomes >20 Mb
(see Discussion). The PCR further allowed us to disentangle
two independent effects (PC I and PC II), which
contributed separately to the variation in recombination
rate (Table 6, Fig. 7). Distance to chromosome end
clustered together with microsatellite and motif density in
PC I. Chromosome size made the main contribution to
PC II, which points towards an independent effect of
chromosome size on the recombination rate.

Recombination rate conservation 

A broad-scale overview of the recombination landscape 
in flycatcher compared to zebra finch and chicken is 
given in Fig. 8, which depicts the relationship between 
physical position and cumulative genetic map length 
for each chromosome. Clearly, the flycatcher landscape 
is more similar to that in chicken than to that in zebra 
finch. Although all three species show an increased 
recombination rate towards chromosome ends, this 
trend is much more pronounced in the zebra finch than 
in the other two species. This difference is reinforced by 
the very low rate of recombination in the interior 
regions of zebra finch chromosomes. 

As recombination impacts the patterns of local base
composition in avian genomes via GC-biased gene
conversion (Mugal et al. 2013), GC content might be a
good indicator of long-term global recombination rate
variation. A correlation between GC content and
current recombination rate might thus be indicative of
long-term conservation in recombination rate variation
across the genome. This was indeed observed
(r = 0.47, P < 10^-?; Supporting information Fig. S11).
In the light of this, we repeated the calculations of
pairwise correlation, MLR and PCR using GC content
as the response variable. MLR showed that motif density
and microsatellite density explained most of the
variation in GC content, followed by distance to chromosome
end and chromosome size (Table 6). PCR
showed that variation in GC content was explained by
two major principal components: PC I was composed
primarily of motif density, microsatellite density and
distance to chromosome end, while PC II was composed
of chromosome size, repeat density and gene
density (Table 6). This trend was thus consistent with
the variation in pedigree-based recombination rate estimates.
In fact, when recombination rate and GC
content were used as a combined response variable,
more than half of the variation was explained by
these variables.

Table 6 Estimates (Est) and statistical significance (P-value) of multi-linear regression (MLR) analysis for six candidate explanatory
variables of variation in recombination rate and GC content, respectively

                    Recombination                                GC content
Parameter           Est       P-value        r       PCR (%)     Est       P-value     r       PCR (%)
Distance to end     -0.069    <10^-?         -0.37    5.37       -0.048    <10^-?      -0.42   12.21
Chromosome size     -0.036    <10^-?         -0.21    1.89       -0.037    <10^-?      -0.37    9.34
Microsatellites      0.002    0.67            0.23    5.75        0.107    <10^-?       0.58   13.21
Repeat density      -0.001    0.81            0.01    2.60       -0.059    <10^-?      -0.11   10.24
Motif density        0.061    <10^-?          0.38    5.78        0.133    <10^-?       0.65   15.40
Gene density         0.022    9.0 × 10^-?     0.09    0.78        0.073    <10^-?       0.29    5.52
Total                                                 22.2                                     65.5

Included in the table is also the raw Pearson's pairwise correlation coefficient (r) between each explanatory variable and
recombination rate, and the amount of variation explained by each explanatory variable according to the principal component
regression (PCR). Data are from chromosomes >20 Mb.

Fig. 7 Barplot showing the fractions
explained by each parameter in the principal
component regression analysis with
(a) recombination and (b) GC content as
response variables. In each analysis, the
proportional contribution of each explanatory
variable was very similar, but the
total amount of the variance explained
was higher when using the GC content
as a response variable (notice the difference
in scaling of the y-axis).



Discussion 

We have capitalized on the power of contemporary
DNA sequencing technology to develop a high-resolution
genetic map of the collared flycatcher genome. This
allowed the construction of an improved genome
assembly and downstream analyses of recombination
rate variation and chromosomal evolution at high
resolution. High-throughput sequencing was critical in
the process of map construction for at least two reasons.
First, markers for the map were well distributed
across the genome. This owes to the fact that we had a
draft assembly of the flycatcher genome, constructed
using high-throughput sequencing but without physical
mapping tools such as BAC or fosmid clones, from
which suitably distributed markers could be selected.
Second, the availability of polymorphism data from
whole-genome resequencing of population samples
meant that we could select highly variable markers. We
also capitalized on new technology for the development
of a genetic map with unusually high marker density in
a nonmodel organism by performing array-based
SNP genotyping using a custom 50 K SNP array,
purposely developed for this endeavour (Kawakami et al.
2014).

Fig. 8 The relationship between the genetic (cM) and the physical (Mb) distance in the best-order map of the collared flycatcher
genome (blue circle, this study), zebra finch (orange square, Backstrom et al. 2010a) and chicken (black cross, Groenen et al. 2009). For
comparison purposes, chromosomes 1 and 4 in chicken were split into two chromosomes, corresponding, respectively, to
chromosomes 1 and 1A, and chromosomes 4 and 4A, based on the zebra finch genome. Data were not available for chromosomes 2, 21, 22,
24, 26, 27 and 28 in zebra finch.

By integrating high-density linkage map data with
scaffold sequences from the draft genome assembly, we
obtained a significantly improved assembly of the collared
flycatcher genome. The assembly has 98.5% of the
anchored sequence ordered and oriented along chromosomes
and a super-scaffold N50 size of 20.2 Mb. It
covers 33 autosomes and the Z chromosome, which
compares well with the two Sanger-sequenced avian
genomes [chicken: 28 autosomes with >0.1 Mb of
assembled sequence (ICGSC 2004); zebra finch: 31 autosomes
with >0.1 Mb of assembled sequence (Warren
et al. 2010)].
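
For readers unfamiliar with the N50 statistic quoted above, the sketch below uses the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The input lengths are hypothetical; they are not the flycatcher super-scaffold sizes.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("n50() needs at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical super-scaffold sizes in Mb, for illustration only
    print(n50([10, 8, 6, 4, 2]))
```

An N50 of 20.2 Mb thus means that half of the assembled sequence lies in super-scaffolds of at least that size.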

The karyotype of collared flycatcher has not been
characterized. For 25 other bird species of the order
Passeriformes, chromosome number is in the range of
2n = 72-84, with 19 species showing 2n = 78-80 (which
is also the most common number across the whole class
of Aves; Gregory 2011). This could suggest that there
are at most 5-6 small chromosomes for which we still
have not anchored scaffolds to linkage groups. Compared
to a random process of marker selection for genotyping,
an informed strategy of using SNPs from the
vast majority of all scaffolds was of obvious benefit for
linkage-based scaffold ordering and orientation.
Together, this illustrates that it is feasible to obtain a de
novo assembly of a vertebrate genome with nearly continuous
chromosome sequences, without additional
genomic resources or molecular tools. The latter represents
the default situation for essentially all nonmodel
organisms.

A 'core' avian genome 

It is known that birds show less variation in genome
size than other amniote lineages (Griffin et al. 2007; Ellegren
2010). However, it is remarkable that with flycatcher
now added to the avian genomes so far
sequenced at high sequence continuity, all have assemblies
of ~1.10 Gb, with 1.02-1.04 Gb assigned to chromosomes
(ICGSC 2004; Dalloul et al. 2010; Warren et al.
2010; Huang et al. 2013; Shapiro et al. 2013). Moreover,
the amount of sequence assigned to syntenic chromosomes
showed very limited variation among species
(Table 2). This indicates that the overall DNA content
of birds is highly conserved across divergent lineages,
although there may be occasional genome size expansions
from increased transposon activity in certain
lineages (Organ & Edwards 2011). Cytometric estimates
of total DNA content of birds vary more, between 1
and 2 pg and with the majority in the range 1.2-1.5 pg
(1 pg ≈ 0.98 Gb; Gregory 2011). However, these estimates
have been obtained by several different methods
and are sensitive to calibration, experimental error and
gender. For chicken, recent estimates tend to converge
at 1.20-1.25 pg (Mendonca et al. 2010). It thus remains
to be seen from other species how much genome size
actually varies across birds; it may very well be that the
variation is even more limited than previously indicated
by cytometry.

The rate of chromosomal evolution in birds 

It is clear that the avian karyotype has remained largely
stable during the evolution of modern birds (Griffin
et al. 2007; Ellegren 2010), which is in sharp contrast to
frequent interchromosomal rearrangements occurring
during, for example, mammalian evolution (Murphy
et al. 2005). However, it is less clear whether the rate of
intrachromosomal rearrangements also varies among
vertebrate lineages and whether avian chromosomes are
slowly evolving also in this respect. Quantitative analyses
have largely been lacking, and comparisons among
taxa are sensitive to methodology and resolution. Our
data demonstrate that the rate of inversion in the
sampled avian lineages (1.5-2.0 inversions per My) is
similar to many mammalian lineages analysed with the
same algorithms and resolution (Supporting information
Table S3). In fact, if one takes into account that the
DNA content of avian genomes is generally <50% of
that of mammalian genomes, the rate of inversions per
Mb is higher in the sampled avian lineages than in
many mammals, like primates (Zhao & Bourque 2009).
Thus, a stable avian karyotype does not translate
into an overall stability of the organization within bird
chromosomes.

Variation in the rate of inversion from 1.5 events per
My (flycatcher lineage) to 2.0 events per My (zebra
finch) gives some indication that there is rate variation
among avian lineages for intrachromosomal rearrangements,
just as there is substitution rate variation; for
example, for the two avian orders in focus here, the
substitution rate in Passeriformes is higher than in
Galliformes (Nam et al. 2010; Nabholz et al. 2011). There are
rare examples of avian species/families with unusually
small (2n = 40-50) or large (2n = 130-140) numbers of
chromosomes (Gregory 2011), and it will be interesting
to see whether the dynamic karyotype evolution (with
fusions and fissions) in these lineages is associated with
a high rate of intrachromosomal rearrangements.

Birds have less repetitive DNA than other amniotes,
judging from the repeat content of the avian genomes so far
sequenced (ICGSC 2004; Dalloul et al. 2010;
Warren et al. 2010; Ellegren et al. 2012). It has been
tempting to associate the karyotypic stability of birds
with the low repeat content under the scenario that
fewer (transposable) repeats provide less opportunity
for nonallelic homologous recombination (Burt et al.
1999). However, why then would the rate of inversion
be at least as high in birds as in repeat-rich mammalian
genomes? One explanation could be that the role of
repeats in mediating chromosomal mutations differs
between inversions and interchromosomal rearrangements,
such as translocations or fusions/fissions. However,
there is strong evidence for the involvement of
transposable elements in generating inversions, consistent
with our observation of increased repeat density in
avian intrachromosomal break points (Kidd et al. 2008;
Lee et al. 2008; Zhao & Bourque 2009). Moreover, repetitive
sequences such as gene duplicates, gene clusters or
other forms of segmental duplications (Armengol et al.
2003; Bailey et al. 2004; Zhao & Bourque 2009) are
frequently found at sites of break points, including in
birds (Dalloul et al. 2010; Volker et al. 2010). This suggests
that karyotypic evolution is unrelated to repeat
density and that the conserved chromosome structure
so characteristic for birds owes to other factors. Perhaps
the mechanisms of chromosome replication, recombination
or segregation at avian meiosis are less prone to
interchromosomal rearrangements in the first place.
Alternatively, the negative fitness effects of such mutations
could be more severe than in other vertebrate lineages,
meaning that they are to a larger extent removed
by selection in birds.

It is interesting to note that despite a stable karyo- 
type, there has been a dynamic process of sequences 
changing their relative position within chromosomes 
during avian evolution. For example, since the split 
of flycatcher and zebra finch lineages 40 My ago, 25- 
30% of all sequence has been repositioned by inver- 
sions. This provides an unusual opportunity to com- 
pare molecular evolutionary parameters between 
sequences that have remained in the same chromo- 
somal position for a long time and sequences that 
have become integrated into another context of the 
genomic landscape, yet remaining on the same chro- 
mosome. This includes aspects such as the evolution 
of base composition, substitution rates and recombina- 
tion rates. 

Characteristics of chromosomal break points 

Previous work has suggested that chromosomal break 
points are reused during evolution, representing hot 
spot regions for chromosome instability (Pevzner & Tes- 
ler 2003; Larkin et al. 2009; Skinner & Griffin 2012). 
However, the case for evolutionary reuse of break 
points is an issue of discussion (Sankoff & Trinh 2005; 
Peng et al. 2006; Alekseyev & Pevzner 2007) and may in 
the end be a matter of resolution (Becker & Lenhard 
2007; Larkin et al. 2009; Attie et al. 2011). Still, our 
results demonstrate a concentration of break points to 
certain regions of the avian genome with 17% of break 
point regions being reused. Observations of indepen- 
dently occurring rearrangements at approximately the 
same chromosomal position in different avian lineages 
have generally been made with much lower resolution 
than applied herein (Griffin et al. 2007; Kemkemer et al. 
2009; Dalloul et al. 2010; Volker et al. 2010; Skinner & 
Griffin 2012), and cannot unambiguously distinguish 
between reuse of sites or regions. Our data point to the
former, as the observed break points were generally
small (median size 2.4 kb).

Molecular evolutionary analyses often reveal that sev- 
eral genomic parameters are interrelated. Similarly, we 
found several parameters to correlate with the location 
of chromosomal break points in the flycatcher-zebra 
finch comparison, including recombination rate,
distance to chromosome end, chromosome size, repeat
density and GC content. Although it is difficult to dis- 
sect the causal relationships between these correlations, 
we note that recombination events as well as chromo- 
some rearrangements are initiated by the formation of 
double-strand breaks (DSBs; Baudat et al. 2013). An 
association between recombination rate and chromo- 
somal break points has been independently demon- 
strated in a comparison of the chicken and turkey 
genomes (Volker et al. 2010). 

The role of chromosome rearrangements during avian 
evolution 

Chromosomal speciation models posit that rearrange- 
ments distinguishing diverging populations will pro- 
mote speciation via underdominance (due to fitness 
reduction of unbalanced gametes in heterozygotes; 
White 1973; Rieseberg 2001) or by reducing interspe- 
cific recombination in the rearranged regions hindering 
gene flow and facilitating the build-up of genetic 
incompatibilities (Noor et al. 2001; Navarro & Barton 
2003; Coyne & Orr 2004; Kirkpatrick & Barton 2006). 
Genomic rearrangements may also be important for 
local adaptation by bringing functionally related loci 
together (Yeaman 2013). Empirical evidence that sup- 
pressed recombination in regions of inversions is asso- 
ciated with speciation is accumulating in both animals 
and plants (Hoffmann & Rieseberg 2008; Nachman & 
Payseur 2012). However, there are so far only few, if 
any, well-documented examples of inversions contrib- 
uting to speciation in birds. On the other hand, inver- 
sion polymorphisms associated with distinct 
phenotypic differences have been detected, like a 
nearly 100 Mb inversion in one of the macrochromo- 
somes of the white-throated sparrow (Zonotrichia albi- 
collis) associated with a suite of traits including 
behavioural phenotypes (Thorneycroft 1966; Thomas 
et al. 2008). 

Diversification rates are likely to differ over time, and 
estimating speciation rates is notoriously difficult, even 
for the most recent divergences. Speciation durations in 
both mammals and birds may entail at least 2 million 
years on average (Avise et al. 1998; Coyne & Orr 2004),
and the loss of hybrid fertility in birds may be of the 
order of millions of years (Price & Bouvier 2002; Fitzpa- 
trick 2004). We thus note that the observed rate of intra- 
chromosomal rearrangements (1.5-2.0/My) has been 
sufficiently high for inversions to potentially play a sig- 
nificant role in the build-up of reproductive incompati- 
bility in birds. We encourage further research on the 
genetics of speciation in birds that specifically seeks to
address this question. 

Recombination rates in the flycatcher genome

The considerable variation in chromosome size in avian 
genomes is associated with systematic variation in 
chromosome-specific recombination rates: recombina-
tion rate shows a clear increase with decreasing chro-
mosome size (ICGSC 2004; Stapley et al. 2008;
Backstrom et al. 2010a). There is evidence from several 
organisms of one obligate crossing-over per chromo- 
some, often thought to be necessary for proper segrega- 
tion of chromosomes at meiosis (Fledel-Alon et al. 2009; 
Wang et al. 2012). The observation that the intercept of 
a linear correlation between flycatcher chromosome size 
and genetic length was at ~50 cM (Fig. 4a) shows that
a genetic distance of 50 cM applies regardless of the 
size of chromosomes and leads to very high rates of 
recombination per physical unit of DNA. This is 
entirely consistent with our observations: both MLR 
and PCR showed that chromosome size had a strong
impact on the rate of recombination. When 50 cM was 
subtracted from the genetic length of each chromosome 
(reflecting the genetic length accrued by one crossing- 
over), we found that recombination rate was nearly con- 
stant across chromosomes and thus independent of 
chromosome size. This would suggest that the number 
of additional recombination events per chromosome 
solely reflects variation in chromosome size and need 
not be related to inherent differences among chromo- 
somes in the rate of recombination per physical unit of 
DNA. 
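
As a rough illustration of the argument above, the following Python sketch contrasts the crude per-chromosome rate with the rate obtained after subtracting the 50 cM attributed to one obligate crossing-over. The chromosome names, sizes, genetic lengths and the underlying 2 cM/Mb slope are hypothetical values chosen for illustration; they are not estimates from the flycatcher map.

```python
# Hypothetical chromosomes: (physical size in Mb, genetic length in cM).
# Values are invented so that length = 50 cM + 2 cM/Mb * size; they are
# not measurements from the flycatcher linkage map.
OBLIGATE_CM = 50.0  # genetic length accrued by one obligate crossing-over

chromosomes = {
    "macro_A": (120.0, 290.0),
    "macro_B": (60.0, 170.0),
    "micro_A": (20.0, 90.0),
    "micro_B": (5.0, 60.0),
}


def crude_rate(size_mb, length_cm):
    """Recombination rate in cM/Mb from the total genetic length."""
    return length_cm / size_mb


def adjusted_rate(size_mb, length_cm, obligate_cm=OBLIGATE_CM):
    """Rate in cM/Mb after removing the obligate crossing-over."""
    return (length_cm - obligate_cm) / size_mb


crude = {name: crude_rate(*v) for name, v in chromosomes.items()}
adjusted = {name: adjusted_rate(*v) for name, v in chromosomes.items()}

for name in chromosomes:
    print(f"{name}: crude {crude[name]:.2f} cM/Mb, "
          f"adjusted {adjusted[name]:.2f} cM/Mb")
```

Under these assumed inputs the crude rate rises steeply as chromosomes get smaller, whereas the adjusted rate is the same for every chromosome, which is the pattern described in the text.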

Many species across different groups of organisms 
are heterochiasmic; that is, they show genome-wide dif- 
ferences in the sex-specific rates of recombination (Burt 
et al. 1991; Lenormand 2003; Lenormand & Dutheil 
2005). The observation of on average 10% higher recom- 
bination in flycatcher males than in females is in line 
with the idea that suppressed sex chromosome recombi-
nation in the heterogametic sex somehow 'spills over' onto
autosomes, to reduce the genome-wide rate of recombi- 
nation in that sex (Burt et al. 1991). However, there are 
exceptions to this and there are also several alternative 
explanations for why sex differences in recombination
evolve (Otto & Lenormand 2002; Lenormand 2003; 
Hansson et al. 2005). One interesting possibility is that 
epistatic interactions between loci can favour the spread 
of sexually antagonistic alleles when recombination dif- 
fers between males and females (Mank 2009; Connallon 
& Clark 2010; Wyman & Wyman 2013). Under this sce- 
nario, one could potentially expect localized regions 
with pronounced sex differences in the rate of recombi-
nation, such as on chromosomes 17 and 27 in
collared flycatcher and on chromosomes 9 and 19 in
humans (Kong et al. 2010), as candidate regions under
sexually antagonistic selection. 



The recombination rate of the Z chromosome 
(2.7 cM/Mb) was essentially identical to that of simi- 
larly sized autosomes. Birds have female heterogamety 
(males ZZ, females ZW), so the Z chromosome does not 
recombine in females, with exception of the pseudoaut- 
osomal region, and the estimated rate comes from male 
meiosis only. The effective recombination rate of the 
Z chromosome is thus 2/3 × 2.7 = 1.8 cM/Mb (not
1/2 × 2.7, as two of the three potentially transmitted Z
chromosomes per breeding pair will recombine; Lohmu-
eller et al. 2010). Moreover, this is independent of any
difference in the effective population size of males and 
females, and of the female-to-male breeding ratio. Sex 
chromosomes are often considered as hot spots for spe- 
ciation, that is the large-X effect (Coyne & Orr 2004) or 
Coyne's rule (Turelli & Moyle 2007). One of several 
explanations for this is that the rate of recombination of 
the X/Z chromosome is lower than of autosomes due 
to reduced recombination in the heterogametic sex, 
thereby facilitating the maintenance of combinations of 
diverged gene variants (see Qvarnstrom & Bailey 2008). 
However, our quantitative analysis shows that the effec- 
tive recombination rate of the Z chromosome is not 
much different from the sex-averaged rate of the three 
largest chromosomes (2.0 cM/Mb), which encompass 
more than 35% of the flycatcher genome. If generally 
applicable, this would suggest that the large-X effect 
is mainly attributable to other factors, such as dominance
(Coyne & Orr 2004). 
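
The sex-weighted calculation in this paragraph can be written out as a short Python sketch. It assumes, as the text does, that two of every three transmitted Z chromosomes pass through male meiosis, and it ignores the pseudoautosomal region. The function name and the female rate of zero are illustrative simplifications, not part of the original analysis.

```python
def effective_z_rate(male_rate, female_rate=0.0):
    """Sex-weighted Z-chromosome recombination rate in cM/Mb.

    Per breeding pair, two of the three transmitted Z chromosomes come
    from the male (ZZ) and one from the female (ZW). The female rate
    defaults to zero, which ignores the pseudoautosomal region.
    """
    return (2.0 / 3.0) * male_rate + (1.0 / 3.0) * female_rate


z_male = 2.7             # cM/Mb, male-meiosis estimate quoted in the text
largest_autosomes = 2.0  # cM/Mb, sex-averaged rate of the three largest chromosomes

z_effective = effective_z_rate(z_male)
print(f"effective Z rate: {z_effective:.1f} cM/Mb")
print(f"ratio to largest autosomes: {z_effective / largest_autosomes:.2f}")
```

Setting `female_rate` equal to `male_rate` recovers the unweighted rate, which is a quick check that the weights sum to one.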

The unusual heterogeneity in the rate of recombina- 
tion in avian genomes, in particular the high rate of 
recombination in microchromosomes, will impact on 
several aspects of molecular ecological and molecular 
evolutionary analyses. For example, higher marker den- 
sities will be required for detection of linkage in regions 
with high recombination rate in QTL mapping and 
genome-wide association studies. However, when this 
is done, causative loci are likely to be in closer physical 
vicinity to assayed markers in those high recombination 
rate regions than in low recombination rate regions. 
Another aspect is that a heterogeneous recombination 
landscape can provide increased power in detecting 
correlations between the rate of recombination and 
genomic parameters potentially associated with recom- 
bination. For example, recombination rate is expected to 
correlate with both nucleotide diversity and the rate of 
protein evolution, in the latter case related to the effi- 
cacy of selection (Webster & Hurst 2012). Much focus is 
currently put on the question whether Hill-Robertson 
interference - the counteracting effect on genetic varia- 
tion at linked sites by selection - is mainly caused by 
selective sweeps for advantageous alleles or back-
ground selection against slightly deleterious mutations
(Campos et al. 2014). Avian genomic data may be useful
in resolving this issue, by comparing sequence evolu- 
tion in regions with markedly different recombination 
rates. 

Conservation of rates and patterns of recombination 

Comparative studies provide evidence for a phyloge- 
netic signal in recombination rate variation among spe- 
cies (i.e. conservation of recombination rates; Dumont & 
Payseur 2008, 2011; Smukowski & Noor 2011; Segura 
et al. 2013). In line with this, we previously found that 
the rate of recombination in orthologous regions of the 
chicken and zebra finch genomes was correlated (Backs- 
trom et al. 2010a). The strong correlation between GC 
content and flycatcher recombination rate seen in the 
present study is consistent with long-term conservation 
of the recombination landscape in birds, with 
GC-biased gene conversion driving GC content in 
regions of high recombination. However, there is also 
evidence from other studies that the total amount of 
recombination can vary among related species, or even 
subspecies (Dumont et al. 2011). For example, the
human genetic map is more than twice as long
as that of mouse and rat (Jacob et al. 1995; Die-
trich et al. 1996; Cox et al. 2009), although genome size
is only 10% larger in humans than in rodents. With one 
obligate crossing-over per chromosome or chromosome 
arm (Pardo-Manuel de Villena & Sapienza 2001), varia- 
tion in number of chromosomes or number of chromo- 
some arms can explain at least part of the variation in 
total amount of recombination among species, as is the 
case in the comparison of primates and rodents. We 
found that the total amount of recombination in fly- 
catcher was 200% of that of zebra finch despite both 
species belonging to the same order of birds and their 
karyotypes probably being very similar. In contrast, the 
amount of recombination in flycatcher was similar to 
that in the more distantly related chicken (Groenen 
et al. 2009); chicken and flycatcher lineages diverged 
about 80 My ago (Nabholz et al. 2011). Our data thus 
point both to long-term conservation in the amount of
avian recombination and to the possibility of relatively
short-term changes.

Domestication may select for increased recombination 
by favouring the generation of new haplotypes and 
new gene combinations in the face of drastically chan- 
ged selection pressures (Burt & Bell 1987; Ross-Ibarra 
2004). It has been hypothesized that this could explain 
the higher total amount of recombination in chicken 
(and turkey, which appears similar to chicken with 
respect to recombination, Aslam et al. 2010) than in 
zebra finch (Backstrom et al. 2010a; van Oers et al. 
2014). However, with similar recombination rates in
chicken and flycatcher, our data do not support this
hypothesis. If anything, the fact that zebra finch linkage 
map data come from birds held in captivity for many 
generations does not support increased recombination 
as a response to artificial selection. It has also been sug- 
gested that passerine birds would have lower recombi- 
nation rates than galliforms (van Oers et al. 2014), a 
view tentatively supported by low-density linkage map 
data from some species (Akesson et al. 2007; Hansson 
et al. 2009; Jaari et al. 2009; van Oers et al. 2014). This 
might be true although it was not supported by our 
data as both flycatcher and zebra finch belong to the 
order Passeriformes. In general, we caution against tak- 
ing interpretations from low-density linkage maps of 
species without an assembled genome sequence too far. 
With increased recombination rate towards chromo- 
some ends, which might not necessarily be covered in 
linkage maps based on random markers, and by an 
additional 50 cM added to the total map length for 
every inclusion of another microchromosome, low-den- 
sity linkage maps may grossly underestimate the total 
amount of recombination. 
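
A back-of-the-envelope Python sketch of the second point: if each chromosome contributes roughly 50 cM regardless of its size, a map that misses several microchromosomes falls short by at least a predictable amount. The number of missed microchromosomes below is hypothetical; the 3132 cM total is the flycatcher best-order map length reported in this study. The sketch does not model the additional loss from poorly covered chromosome ends, so it should be read as a lower bound.

```python
OBLIGATE_CM = 50.0  # approximate genetic length per chromosome from one crossing-over


def map_shortfall(n_missed, obligate_cm=OBLIGATE_CM):
    """Minimum genetic length (cM) lost by leaving out n_missed chromosomes."""
    return n_missed * obligate_cm


full_map_cm = 3132.0  # flycatcher best-order map length
n_missed = 10         # hypothetical number of unmapped microchromosomes

shortfall = map_shortfall(n_missed)
fraction = shortfall / full_map_cm
print(f"shortfall: {shortfall:.0f} cM ({fraction:.0%} of the full map)")
```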

Although homologous chromosomes of flycatcher and 
chicken are differently organized due to inversions, the 
broad-scale recombination landscape in these two dis- 
tantly related birds was similar (Fig. 8). This stands in 
sharp contrast to the recombination landscape in zebra 
finch. Large zebra finch chromosomes are characterized 
by the presence of extensive recombination deserts
spanning most of the interior parts of these chromo-
somes, not seen in flycatcher (Stapley et al. 2008; Backs-
trom et al. 2010a). As much as 80% of the total amount of
recombination is concentrated on the 20% distal parts of 
several large chromosomes. Although there is an increase 
in recombination rate towards the ends of chromosomes 
in flycatcher and chicken, this effect is far from as 
dramatic as in zebra finch (Supporting information 
Table S4). As far as we are aware, it is not known what
factors may affect differences in the distribution of cross- 
ing-over events along chromosomes in birds. It could be 
noted that the Prdm9 gene, which plays a critical role in 
mammalian recombination, appears to be lacking in 
avian genomes (Ponting 2011).

Perspectives and conclusions 

Developments in the use of genetic approaches for 
addressing ecological and evolutionary questions in 
Ficedula flycatchers well illustrate the overall develop- 
ments in the field of molecular ecology and provide a 
timeline for its progress. For the Ficedula system, this
began about 25 years ago with the analysis of 
allozymes and restriction fragment length polymor- 
phisms of pied flycatcher mtDNA (Gelter et al. 1989; 
Tegelstrom & Gelter 1990), and was soon followed by
the introduction of microsatellite (Ellegren 1991, 1992)
and DNA fingerprinting markers (Gelter & Tegelstrom 
1992; Ratti et al. 1995). Questions at this time were 
mainly related to behavioural ecology, like the fitness 
return of EPF, but also focused on speciation and 
hybridization. It also included tests of sex allocation 
theory, using PCR-based approaches for molecular sex-
ing (Ellegren et al. 1996; Sheldon & Ellegren 1996).
DNA sequencing of flycatcher mtDNA came into use 
around year 2000 (Saetre et al. 2001) and provided 
phylogenetic perspectives and increased resolution for 
the detection of hybridization between flycatcher spe- 
cies. This was subsequently augmented with the use 
of nuclear single-nucleotide polymorphisms, SNPs 
(Primmer et al. 2002), and opened an avenue for study-
ing gene flow, introgression and population structure
in further detail (Saetre et al. 2003; Borge et al. 2005;
Lehtonen et al. 2009). In the middle of the last decade,
genetic mapping efforts of flycatchers began and intro- 
duced a genomic perspective, although the genome 
itself was still considered far away on the horizon
(Backstrom et al. 2006, 2008, 2010b). However, the tre- 
mendous power offered by next-generation sequencing 
technology meant that a draft sequence of the collared 
flycatcher genome could be presented in 2012 (Ellegren
et al. 2012), providing novel insights into genomic 
divergence during lineage splitting. Moreover, this 
provided a platform for genome-wide studies of gene 
expression (Uebbing et al. 2013) and flycatcher popula- 
tion history (Nadachowska-Brzyska et al. 2013). Fur- 
thermore, recent proteomic analysis suggests that 
functional genomic studies are on their way (Leskinen 
et al. 2012). All in all, this makes Ficedula flycatcher a 
prime model organism in molecular ecology. 

This study provides a genome assembly with nearly 
continuous chromosome sequences and a detailed 
genetic map of the flycatcher genome. Together, this 
information allowed us to conclude that the highly con- 
served nature of the avian karyotype stands in sharp 
contrast to the observation of frequent intrachromoso- 
mal rearrangements during avian evolution. The rate of 
these rearrangements is high enough for inversions to 
potentially have been involved with many events of 
speciation. We envision that this idea could be tested 
by mapping inversion events onto a phylogeny of clo- 
sely related species of birds. Moreover, we found signif- 
icant variation in the rate of recombination across the 
genome and concluded that the large effect of chromo- 
some size could mainly be explained by obligate recom- 
bination events per chromosome. Surprisingly, the 
overall level and patterns of flycatcher recombination 
were more similar to chicken than to the more closely 
related zebra finch. 



We recently showed that the genomic landscape of 
divergence between pied and collared flycatchers is 
characterized by the presence of numerous 'differentia-
tion islands', with markedly higher FST than in the
genomic background (Ellegren et al. 2012). This observa-
tion was based on data from whole-genome resequenc-
ing of a relatively limited number of individuals of
each species. A direction that now should be taken is to 
sequence multiple sympatric and allopatric populations 
of both species, and to integrate these data with informa-
tion on recombination rate variation across the genome.
This would allow addressing whether differentiation 
islands in multiple population comparisons coincide 
with recombination cold-spot regions. Such an association
would be compatible with a scenario of incidental 
islands where selection at linked sites locally reduces 
the effective population size and thereby enhances the 
rate of lineage sorting. Essentially, this means distin- 
guishing a scenario of genomic islands of speciation 
from a scenario of genomic islands and speciation, quot- 
ing Turner & Hahn (2010). 